HyperTraPS (hypercubic transition path sampling) is a family of algorithms for inferring the dynamics of “accumulation” processes. These are processes where a set of binary features are acquired over time.

HyperTraPS will take a set of observed states, described by binary strings recording the presence/absence of each feature. It may optionally take initial states from which these states are reached, and information on the timings associated with each observation. It returns various summaries of which feature are acquired and when, and how these features influence each other.

Loading the software

If we just want the HyperTraPS code without other dependencies, we only need Rcpp, and can use the following

library(Rcpp)
sourceCpp("hypertraps-r.cpp")

If we want various helper functions and ggplot functions in addition, use this

source("hypertraps.R")
## 
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## The following object is masked from 'package:base':
## 
##     union

Simple demo

Here we’ll construct a simple synthetic dataset. The m.1 matrix will store a set of initial states, and the m.2 matrix will store a set of later observed states. The first row of m.1, for example, stores the state 00000, where no features have been acquired. The first row of m.2 stores 10000, where the first of five features has been acquired.

The times correspond to each observation, so the transition 00000->10000 described by those first rows has an associated time of 0.1 (in whatever units we are working with).

m.1 = matrix(rep(c(0,0,0,0,0,
               1,0,0,0,0,
               1,1,0,0,0,
               1,1,1,0,0,
               1,1,1,1,0,
               0,0,0,0,0,
               0,0,0,0,1,
               0,0,0,1,1,
               0,0,1,1,1,
               0,1,1,1,1),5), byrow=TRUE, ncol=5)
m.2 = matrix(rep(c(1,0,0,0,0,
               1,1,0,0,0,
               1,1,1,0,0,
               1,1,1,1,0,
               1,1,1,1,1,
               0,0,0,0,1,
               0,0,0,1,1,
               0,0,1,1,1,
               0,1,1,1,1,
               1,1,1,1,1),5), byrow=TRUE, ncol=5)
times = rep(c(0.1, 0.2, 0.3, 0.4, 0.5), 10)

Let’s run HyperTraPS with these “before” and “after” observations. By using times as both the start and end time arguments, we say that each transition takes precisely that associated time – we could allow broader time windows to capture uncertainty in timing. Finally, we provide labels for the five individual features involved.

my.post = HyperTraPS(m.2, initialstates_arg = m.1, 
                     starttimes_arg = times, endtimes_arg = times,
                     featurenames_arg = c("A", "B", "C", "D", "E")) 
## 
## HyperTraPS(-CT)
## Sep 2023
## 
## Unpublished code -- please do not circulate!
## Published version available at:
##     https://github.com/StochasticBiology/HyperTraPS
## with stripped-down version at:
##     https://github.com/StochasticBiology/hypertraps-simple
## 
## Running HyperTraPS-CT with:
## [observations-file]: rcpp
## [start-timings-file]: 
## [end-timings-file]: 
## [random number seed]: 1
## [length index]: 3
## [kernel index]: 5
## [walkers]: 200
## [losses (1) or gains (0)]: 0
## [APM]: 0
## [model]: 2
## 
## Using MH MCMC
## Number of features is 5, I found 50 observation pairs and 50 timing pairs
## 
## Starting with simple initial param guess
## One likelihood estimation took 1.150000e-03 seconds.
## Initial likelihood is -9.033188e+01
## Second guess is -9.033188e+01
## This code (1000 steps) will probably take around 1.150 seconds (0.000 hours) to complete.
## 
## 0 - Iteration 0 likelihood -90.331879 total-acceptance 0.000000 recent-acceptance 0.000000 trial-likelihood -99.386252
## 100 - Iteration 100 likelihood -66.809091 total-acceptance 0.207921 recent-acceptance 0.210000 trial-likelihood -73.963469
## 200 - Iteration 200 likelihood -55.969824 total-acceptance 0.218905 recent-acceptance 0.230000 trial-likelihood -60.264649
## 300 - Iteration 300 likelihood -51.245479 total-acceptance 0.199336 recent-acceptance 0.160000 trial-likelihood -57.876586
## 400 - Iteration 400 likelihood -53.015445 total-acceptance 0.214464 recent-acceptance 0.260000 trial-likelihood -57.437495
## 500 - Iteration 500 likelihood -55.544181 total-acceptance 0.211577 recent-acceptance 0.200000 trial-likelihood -56.438726
## 600 - Iteration 600 likelihood -50.854181 total-acceptance 0.212978 recent-acceptance 0.220000 trial-likelihood -54.143414
## 700 - Iteration 700 likelihood -51.469557 total-acceptance 0.206847 recent-acceptance 0.170000 trial-likelihood -60.168949
## 800 - Iteration 800 likelihood -49.285480 total-acceptance 0.205993 recent-acceptance 0.200000 trial-likelihood -49.285480
## 900 - Iteration 900 likelihood -50.881319 total-acceptance 0.216426 recent-acceptance 0.300000 trial-likelihood -50.881319
## 
## HyperTraPS(-CT) posterior analysis
## 
## Verbose flag is 0
## Bin scale is 10.000000
## Using posterior samples with 7 x 25 entries
## Based on rcpp with 25 params per model and model 2, there are 5 features
## Output label is rcpp
## allruns is 70
## 0 1.524714
## 1 2.030571
## 2 2.138429
## 3 2.353143
## 4 1.953143

That output takes us through the HyperTraPS process. First, the arguments provided to the function call are described, then the algorithm chosen (MH MCMC) and a summary of the input data is given. To estimate runtime, HyperTraPS reports the time taken for a single likelihood calculation, then scales this by the estimated number of calculations required to give a runtime estimate. Here this is only 1.17 seconds, but for chains long enough for satisfactory convergence (and more complicated systems) this may be dramatically longer.

Because we didn’t turn off output, HyperTraPS periodically outputs information about the run as it goes. Every 100th step, it outputs: the likelihood of the current parameterisation; the step acceptance rate through the whole run; the step acceptance rate since the last output; the likelihood of the most recent proposal. These can help design efficient MCMC approaches: if the acceptance rate is too low and/or the recent proposal likelihood is much lower than the current, consider a smaller perturbation kernel. It also helps us see when the chain is “burned in” – when the likelihood fluctuates, rather than consistently increasing, we’re probably more stable.

After the MCMC run, the posterior analysis begins, and outputs a few details. These include the size of the posterior sample set being explored, the model being used, and a quick summary of the mean acquisition orderings of each feature. These are really just checks for debugging; not much useful information can be seen from them.

Now – visualising the results.

Transition graph plot. This plot shows a set of sampled routes on the hypercubic space. Each edge is a transition, corresponding to the acquisition of a new feature. The edges are labelled by the feature acquired and the statistics of the time taken (mean +- s.d.). The edge width gives the probability flux through that transition.

plotHypercube.sampledgraph2(my.post)
## Warning: Using the `size` aesthetic in this geom was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` in the `default_aes` field and elsewhere instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

Bubble plot. This plot summarises the inferred dynamics of features, forgetting specific states. The size of a circle at point x,y gives the probability that feature y is acquired at ordinal time x.

plotHypercube.bubbles(my.post)

Timing histograms. Histograms of acquisition time for each feature, and probability that each feature is not acquired within a given time threshold (here, 20).

plotHypercube.timehists(my.post)

Time series plot. Each step up the vertical axis corresponds to the acquisition of a new feature. The horizontal axis gives the time of the corresponding event, and the colour labels the feature that was acquired.

plotHypercube.timeseries(my.post)
## Warning: Transformation introduced infinite values in continuous x-axis

Input/output between R data structure and file format

We can write the output of HyperTraPS to a set of files for storage and later retrieval

writeHyperinf(my.post, "simpledemo", my.post$L, postlabel = "simpledemo", fulloutput=TRUE)

We can retrieve output from files, which may have been previously written as above, or may have come from running HyperTraPS at the command line.

my.post.r = readHyperinf("simpledemo", postlabel = "simpledemo", fulloutput=TRUE)

Other functional examples

HyperTraPS allows substantial flexibility in how evolutionary pathways are inferred, how likelihoods are estimated, and other aspects of the approach.

If we want to sacrifice some accuracy in estimating likelihood for computational speed, we can run an example with fewer walkers sampling pathways (walkers_arg)

my.post.sparse = HyperTraPS(m.2, initialstates_arg = m.1, 
                            starttimes_arg = times, featurenames_arg = c("A", "B", "C", "D", "E"), 
                            walkers_arg = 2,
                            limited_output_arg = 1)
## Start timings, but not end timings, specified. Assuming t = inf ends.
## One likelihood estimation took 4.000000e-05 seconds.
## Initial likelihood is -7.482955e+01
## Second guess is -7.482955e+01
## This code (1000 steps) will probably take around 0.040 seconds (0.000 hours) to complete.
## 
## 0 - 100 - 200 - 300 - 400 - 500 - 600 - 700 - 800 - 900 -

We can ask HyperTraPS to perform stepwise regularisation after fitting a model, pruning extraneous parameters (regularise_arg). The regularisation plot shows how the information criterion behaves as we prune back parameters – the minimum gives us our optimal model.

my.post.regularise = HyperTraPS(m.2, initialstates_arg = m.1, regularise_arg = 1,
                                walkers_arg = 20,
                                limited_output_arg = 1)
## One likelihood estimation took 1.240000e-04 seconds.
## Initial likelihood is -4.787492e+01
## Second guess is -4.787492e+01
## This code (1000 steps) will probably take around 0.124 seconds (0.000 hours) to complete.
## 
## 0 - 100 - 200 - 300 - 400 - 500 - 600 - 700 - 800 - 900 -
plotHypercube.regularisation(my.post.regularise)

plotHypercube.summary(my.post.regularise)

We can use simulated annealing to get a maximum likelihood estimate rather than a Bayesian picture (sa_arg)

my.post.sa = HyperTraPS(m.2, initialstates_arg = m.1, sa_arg = 1,
                        limited_output_arg = 1)
## One likelihood estimation took 1.014000e-03 seconds.
## Initial likelihood is -4.787492e+01
## Second guess is -4.787492e+01
## This code (1000 steps) will probably take around 1.014 seconds (0.000 hours) to complete.
## 
## 0 - 100 - 200 - 300 - 400 - 500 - 600 - 700 - 800 - 900 -
plotHypercube.summary(my.post.sa)

We can also use “phenotype landscape inference” – an unbiased sampling method, unlike the (corrected) bias sampling in HyperTraPS (PLI_arg). This takes longer but is more flexible with uncertain data

my.post.pli = HyperTraPS(m.2, initialstates_arg = m.1, PLI_arg = 1,
                         limited_output_arg = 1)
## One likelihood estimation took 6.447000e-03 seconds.
## Initial likelihood is -2.912900e+02
## Second guess is -2.906914e+02
## This code (1000 steps) will probably take around 6.447 seconds (0.002 hours) to complete.
## 
## 0 - 100 - 200 - 300 - 400 - 500 - 600 - 700 - 800 - 900 -
plotHypercube.summary(my.post.pli)

HyperTraPS supports different parameter structures (model_arg). By default we use an “L2” parameterisation, where each feature can individually influence the acquisition of every other feature. “L1” has features completely independent; “L0” has all features identical. “L3” allows pairs of features to influence each feature’s acquisition; “L4” allows triplets of features to influence each feature’s acquisition. This approach, labelled -1, allows every edge on the hypercube to have its own independent parameters – corresponding to the most flexible possible picture, where arbitrary sets of features influence other features. We then regularise as above (regularise_arg) to remove extraneous parameters

my.post.bigmodel.regularise = HyperTraPS(m.2, initialstates_arg = m.1, model_arg = -1,
                                         regularise_arg = 1, walkers_arg = 20,
                                         limited_output_arg = 1)
## One likelihood estimation took 1.070000e-04 seconds.
## Initial likelihood is -4.787492e+01
## Second guess is -4.787492e+01
## This code (1000 steps) will probably take around 0.107 seconds (0.000 hours) to complete.
## 
## 0 - 100 - 200 - 300 - 400 - 500 - 600 - 700 - 800 - 900 -
plotHypercube.regularisation(my.post.bigmodel.regularise)

Here’s another example of model choice. The data is now generated by a process where pairs of features influence the acquisition of other features. To capture these interactions, an “L^2” picture (model_arg = 2, where each feature individually influences each other feature) is insufficient – an “L^3” picture (model_arg = 3) allowing pair influence is needed. The full parameterisation, where every edge on the hypercube transition network (model_arg = -1) has an independent parameter, can also capture this behaviour.

logic.mat = readLines("Verify/hi-order.txt")
logic.mat = do.call(rbind, lapply(strsplit(logic.mat, ""), as.numeric))
logic.starts = logic.mat[seq(from=1, to=nrow(logic.mat), by=2),]
logic.ends = logic.mat[seq(from=2, to=nrow(logic.mat), by=2),]
logic.post.m1 = HyperTraPS(logic.ends, initialstates_arg = logic.starts, length_index_arg = 4, model_arg = -1, walkers_arg = 20, limited_output_arg = 1)
## One likelihood estimation took 3.200000e-05 seconds.
## Initial likelihood is -1.627202e+01
## Second guess is -1.627202e+01
## This code (10000 steps) will probably take around 0.320 seconds (0.000 hours) to complete.
## 
## 0 - 100 - 200 - 300 - 400 - 500 - 600 - 700 - 800 - 900 - 1000 - 1100 - 1200 - 1300 - 1400 - 1500 - 1600 - 1700 - 1800 - 1900 - 2000 - 2100 - 2200 - 2300 - 2400 - 2500 - 2600 - 2700 - 2800 - 2900 - 3000 - 3100 - 3200 - 3300 - 3400 - 3500 - 3600 - 3700 - 3800 - 3900 - 4000 - 4100 - 4200 - 4300 - 4400 - 4500 - 4600 - 4700 - 4800 - 4900 - 5000 - 5100 - 5200 - 5300 - 5400 - 5500 - 5600 - 5700 - 5800 - 5900 - 6000 - 6100 - 6200 - 6300 - 6400 - 6500 - 6600 - 6700 - 6800 - 6900 - 7000 - 7100 - 7200 - 7300 - 7400 - 7500 - 7600 - 7700 - 7800 - 7900 - 8000 - 8100 - 8200 - 8300 - 8400 - 8500 - 8600 - 8700 - 8800 - 8900 - 9000 - 9100 - 9200 - 9300 - 9400 - 9500 - 9600 - 9700 - 9800 - 9900 -
logic.post.1 = HyperTraPS(logic.ends, initialstates_arg = logic.starts, length_index_arg = 4, model_arg = 1, walkers_arg = 20, limited_output_arg = 1)
## One likelihood estimation took 2.700000e-05 seconds.
## Initial likelihood is -1.627202e+01
## Second guess is -1.627202e+01
## This code (10000 steps) will probably take around 0.270 seconds (0.000 hours) to complete.
## 
## 0 - 100 - 200 - 300 - 400 - 500 - 600 - 700 - 800 - 900 - 1000 - 1100 - 1200 - 1300 - 1400 - 1500 - 1600 - 1700 - 1800 - 1900 - 2000 - 2100 - 2200 - 2300 - 2400 - 2500 - 2600 - 2700 - 2800 - 2900 - 3000 - 3100 - 3200 - 3300 - 3400 - 3500 - 3600 - 3700 - 3800 - 3900 - 4000 - 4100 - 4200 - 4300 - 4400 - 4500 - 4600 - 4700 - 4800 - 4900 - 5000 - 5100 - 5200 - 5300 - 5400 - 5500 - 5600 - 5700 - 5800 - 5900 - 6000 - 6100 - 6200 - 6300 - 6400 - 6500 - 6600 - 6700 - 6800 - 6900 - 7000 - 7100 - 7200 - 7300 - 7400 - 7500 - 7600 - 7700 - 7800 - 7900 - 8000 - 8100 - 8200 - 8300 - 8400 - 8500 - 8600 - 8700 - 8800 - 8900 - 9000 - 9100 - 9200 - 9300 - 9400 - 9500 - 9600 - 9700 - 9800 - 9900 -
logic.post.2 = HyperTraPS(logic.ends, initialstates_arg = logic.starts, length_index_arg = 4, model_arg = 2, walkers_arg = 20, limited_output_arg = 1)
## One likelihood estimation took 3.600000e-05 seconds.
## Initial likelihood is -1.627202e+01
## Second guess is -1.627202e+01
## This code (10000 steps) will probably take around 0.360 seconds (0.000 hours) to complete.
## 
## 0 - 100 - 200 - 300 - 400 - 500 - 600 - 700 - 800 - 900 - 1000 - 1100 - 1200 - 1300 - 1400 - 1500 - 1600 - 1700 - 1800 - 1900 - 2000 - 2100 - 2200 - 2300 - 2400 - 2500 - 2600 - 2700 - 2800 - 2900 - 3000 - 3100 - 3200 - 3300 - 3400 - 3500 - 3600 - 3700 - 3800 - 3900 - 4000 - 4100 - 4200 - 4300 - 4400 - 4500 - 4600 - 4700 - 4800 - 4900 - 5000 - 5100 - 5200 - 5300 - 5400 - 5500 - 5600 - 5700 - 5800 - 5900 - 6000 - 6100 - 6200 - 6300 - 6400 - 6500 - 6600 - 6700 - 6800 - 6900 - 7000 - 7100 - 7200 - 7300 - 7400 - 7500 - 7600 - 7700 - 7800 - 7900 - 8000 - 8100 - 8200 - 8300 - 8400 - 8500 - 8600 - 8700 - 8800 - 8900 - 9000 - 9100 - 9200 - 9300 - 9400 - 9500 - 9600 - 9700 - 9800 - 9900 -
logic.post.3 = HyperTraPS(logic.ends, initialstates_arg = logic.starts, length_index_arg = 4, model_arg = 3, walkers_arg = 20, limited_output_arg = 1)
## One likelihood estimation took 5.400000e-05 seconds.
## Initial likelihood is -1.627202e+01
## Second guess is -1.627202e+01
## This code (10000 steps) will probably take around 0.540 seconds (0.000 hours) to complete.
## 
## 0 - 100 - 200 - 300 - 400 - 500 - 600 - 700 - 800 - 900 - 1000 - 1100 - 1200 - 1300 - 1400 - 1500 - 1600 - 1700 - 1800 - 1900 - 2000 - 2100 - 2200 - 2300 - 2400 - 2500 - 2600 - 2700 - 2800 - 2900 - 3000 - 3100 - 3200 - 3300 - 3400 - 3500 - 3600 - 3700 - 3800 - 3900 - 4000 - 4100 - 4200 - 4300 - 4400 - 4500 - 4600 - 4700 - 4800 - 4900 - 5000 - 5100 - 5200 - 5300 - 5400 - 5500 - 5600 - 5700 - 5800 - 5900 - 6000 - 6100 - 6200 - 6300 - 6400 - 6500 - 6600 - 6700 - 6800 - 6900 - 7000 - 7100 - 7200 - 7300 - 7400 - 7500 - 7600 - 7700 - 7800 - 7900 - 8000 - 8100 - 8200 - 8300 - 8400 - 8500 - 8600 - 8700 - 8800 - 8900 - 9000 - 9100 - 9200 - 9300 - 9400 - 9500 - 9600 - 9700 - 9800 - 9900 -
ggarrange(plotHypercube.graph(logic.post.m1) + ggtitle("All edges") + theme(legend.position="none"),
          plotHypercube.graph(logic.post.1) + ggtitle("L") + theme(legend.position="none"),
          plotHypercube.graph(logic.post.2)+ ggtitle("L^2") + theme(legend.position="none"),
          plotHypercube.graph(logic.post.3)+ ggtitle("L^3") + theme(legend.position="none"))

We can see that the structures inferred by the “full” and “L^3” models are the same, while the “L^2” (and inappropriate “L^1”) will either introduce extraneous transitions or fail to capture the true ones.

Scientific examples

A set of short-form examples from past studies – these should run in a few minutes and give approximations to the original results. In each case we read the observed states from a file (these are present in different formats, so different curation steps are involved in each case), and feature labels from another file. Then we run HyperTraPS and plot some summaries of the output.

  1. Ovarian cancer case study: traits are chromosomal aberrations, observations are independent patient samples.
cgh.mat = readLines("RawData/ovarian.txt")
cgh.mat = do.call(rbind, lapply(strsplit(cgh.mat, ""), as.numeric))
cgh.names = as.vector(read.table("RawData/ovarian-names.txt", sep=","))[[1]]

my.post.cgh = HyperTraPS(cgh.mat, 
                        length_index_arg = 3, outputinput_arg = 1, 
                        featurenames_arg = cgh.names,
                        limited_output_arg = 1) 
## Observed transitions:
## 0: 0000000 -> 0000010
## 1: 0000000 -> 0001010
## 2: 0000000 -> 1111000
## 3: 0000000 -> 1000000
## 4: 0000000 -> 0000000
## 5: 0000000 -> 1111100
## 6: 0000000 -> 1111101
## 7: 0000000 -> 0110000
## 8: 0000000 -> 0100001
## 9: 0000000 -> 1011111
## 10: 0000000 -> 0000000
## 11: 0000000 -> 1111101
## 12: 0000000 -> 1111111
## 13: 0000000 -> 1110011
## 14: 0000000 -> 1000000
## 15: 0000000 -> 0000000
## 16: 0000000 -> 1100011
## 17: 0000000 -> 1101010
## 18: 0000000 -> 1011010
## 19: 0000000 -> 1000110
## 20: 0000000 -> 1011111
## 21: 0000000 -> 1100111
## 22: 0000000 -> 1111110
## 23: 0000000 -> 1010101
## 24: 0000000 -> 0000000
## 25: 0000000 -> 0010000
## 26: 0000000 -> 1111101
## 27: 0000000 -> 1101101
## 28: 0000000 -> 1111101
## 29: 0000000 -> 0110100
## 30: 0000000 -> 1111101
## 31: 0000000 -> 1100111
## 32: 0000000 -> 1011101
## 33: 0000000 -> 1111100
## 34: 0000000 -> 1110111
## 35: 0000000 -> 1001101
## 36: 0000000 -> 1001110
## 37: 0000000 -> 1110111
## 38: 0000000 -> 1111101
## 39: 0000000 -> 1000000
## 40: 0000000 -> 1101010
## 41: 0000000 -> 1111110
## 42: 0000000 -> 0011011
## 43: 0000000 -> 1111000
## 44: 0000000 -> 1111101
## 45: 0000000 -> 0001010
## 46: 0000000 -> 0000000
## 47: 0000000 -> 1011100
## 48: 0000000 -> 0000010
## 49: 0000000 -> 1100000
## 50: 0000000 -> 1111010
## 51: 0000000 -> 1100010
## 52: 0000000 -> 1100000
## 53: 0000000 -> 1110110
## 54: 0000000 -> 1000000
## 55: 0000000 -> 1110110
## 56: 0000000 -> 1101101
## 57: 0000000 -> 0011011
## 58: 0000000 -> 1000000
## 59: 0000000 -> 0000001
## 60: 0000000 -> 1100000
## 61: 0000000 -> 1100010
## 62: 0000000 -> 0011111
## 63: 0000000 -> 1011101
## 64: 0000000 -> 0010001
## 65: 0000000 -> 1100010
## 66: 0000000 -> 0100010
## 67: 0000000 -> 1111101
## 68: 0000000 -> 1111100
## 69: 0000000 -> 0011111
## 70: 0000000 -> 1011110
## 71: 0000000 -> 0011100
## 72: 0000000 -> 1000000
## 73: 0000000 -> 1011100
## 74: 0000000 -> 1110101
## 75: 0000000 -> 1001101
## 76: 0000000 -> 0000000
## 77: 0000000 -> 0101011
## 78: 0000000 -> 1100111
## 79: 0000000 -> 1111011
## 80: 0000000 -> 0111011
## 81: 0000000 -> 0000010
## 82: 0000000 -> 1111101
## 83: 0000000 -> 1100000
## 84: 0000000 -> 1111010
## 85: 0000000 -> 0010000
## 86: 0000000 -> 1100000
## (where 1 is presence)
## 
## One likelihood estimation took 9.886000e-03 seconds.
## Initial likelihood is -3.213415e+02
## Second guess is -3.213415e+02
## This code (1000 steps) will probably take around 9.886 seconds (0.003 hours) to complete.
## 
## 0 - 100 - 200 - 300 - 400 - 500 - 600 - 700 - 800 - 900 -
ggarrange(plotHypercube.lik.trace(my.post.cgh), 
          plotHypercube.bubbles(my.post.cgh, reorder=TRUE), 
          plotHypercube.sampledgraph2(my.post.cgh, no.times=TRUE), nrow=3)

plotHypercube.sampledgraph2(my.post.cgh, no.times=TRUE)

  1. C4 photosynthesis case study: traits are physical/genetic features associated with C4, observations are (incomplete) phylogenetically independent intermediate species.

This case study involves some uncertain data, which are present as “2”s in the source data – labelling features which may be 0 or 1.

c4.mat = as.matrix(read.table("RawData/c4-curated.csv", sep=","))
c4.names = as.vector(read.table("RawData/c4-trait-names.txt", sep=","))[[1]]

my.post.c4 = HyperTraPS(c4.mat, 
                        length_index_arg = 3, 
                        losses_arg = 1,
                        featurenames_arg = c4.names,
                        limited_output_arg = 1) 
## One likelihood estimation took 3.997600e-02 seconds.
## Initial likelihood is -2.694082e+02
## Second guess is -2.695834e+02
## This code (1000 steps) will probably take around 39.976 seconds (0.011 hours) to complete.
## 
## 0 - 100 - 200 - 300 - 400 - 500 - 600 - 700 - 800 - 900 -
plotHypercube.bubbles(my.post.c4, reorder=TRUE)

  1. Severe malaria disease progression case study: traits are clinical features, observations are (incomplete) independent patient presentations
malaria.df = read.csv("RawData/jallow_dataset_binary_with2s.csv")
malaria.mat = as.matrix(malaria.df[,2:ncol(malaria.df)])
malaria.names = as.vector(read.table("RawData/malaria-names.txt", sep=","))[[1]]

my.post.malaria = HyperTraPS(malaria.mat, 
                        length_index_arg = 3,
                        kernel_index_arg = 2,
                        walkers_arg = 20,
                        featurenames_arg = malaria.names,
                        limited_output_arg = 1) 
## One likelihood estimation took 1.508836e+00 seconds.
## Initial likelihood is -7.672008e+04
## Second guess is -7.671213e+04
## This code (1000 steps) will probably take around 1508.836 seconds (0.419 hours) to complete.
## 
## 0 - 100 - 200 - 300 - 400 - 500 - 600 - 700 - 800 - 900 -
plotHypercube.bubbles(my.post.malaria, reorder=TRUE, transpose=TRUE)

  1. Tool use evolution case study: traits are modes of tool use, observations are phylogenetically coupled species observations (phylogeny has been accounted for, giving transition pairs)
tools.mat = as.matrix(read.table("RawData/total-observations.txt-trans.txt"))
tools.names = as.vector(read.table("RawData/tools-names.txt"))[[1]]
tools.starts = tools.mat[seq(from=1, to=nrow(tools.mat), by=2),]
tools.ends = tools.mat[seq(from=2, to=nrow(tools.mat), by=2),]

my.post.tools = HyperTraPS(tools.ends, initialstates_arg = tools.starts, 
                           length_index_arg = 3, 
                           featurenames_arg = tools.names,
                           limited_output_arg = 1) 
## One likelihood estimation took 1.469200e-02 seconds.
## Initial likelihood is -1.811798e+02
## Second guess is -1.811798e+02
## This code (1000 steps) will probably take around 14.692 seconds (0.004 hours) to complete.
## 
## 0 - 100 - 200 - 300 - 400 - 500 - 600 - 700 - 800 - 900 -
plotHypercube.bubbles(my.post.tools, reorder=TRUE, transpose=TRUE)

plotHypercube.sampledgraph2(my.post.tools, node.labels = FALSE, max=100, no.times=TRUE) + theme(legend.position = "none")